-
- I sent this to Tim a while ago, but I don't think
- he's had time to look at it.
-
- Meanwhile, libWWW is becoming reentrant, but I still
- think the architecture is kinda clumsy: you have to
- have a big data structure describing the DTD, and
- a routine for each element, etc.
-
- This doesn't mesh well with the MidasWWW architecture, which
- can read the DTD from the X resource database at
- runtime.
-
- I have an idea for an architecture that the linemode and
- MidasWWW could share (along with other new implementations).
-
- It's not radically different from the current libWWW, but
- there's a lot of grunt-work between the current libWWW
- and what I've got here. But I think the end result would
- be much more usable.
-
- We start with the HText class. Instead of the various
- style and append methods, we have four methods in a
- virtual function table:
-
- typedef struct{
- int (*start_tag) PARAMS((SGML_Object this, CONST char* gi,
- CONST char** attributes, int nattrs));
- VOID (*end_tag) PARAMS((SGML_Object this, CONST char* gi));
-
- VOID (*entity) PARAMS((SGML_Object this, CONST char* name));
-
- VOID (*data) PARAMS((SGML_Object this, CONST char* data, int char_qty));
- }SGML_DocClass;
-
- The linemode would declare something like:
-
- SGML_DocClass griddoc = {HText_start_tag, HText_end_tag,
- HText_entity, HText_data};
-
- The HText implementation is responsible for keeping track of
- the stack of open elements, if it needs to.
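- Below is a minimal sketch of one such implementation. It uses ANSI C in place of the PARAMS/CONST/VOID macros. The names StackDoc and stack_* are hypothetical. SGML_Object is assumed to be an opaque pointer to the implementation's own state, and start_tag's return value (the element's content type) is simply 0 here. The sketch shows the element stack living entirely inside the document class, with an end tag also closing any elements left open inside it:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: SGML_Object is an opaque pointer to implementation state. */
typedef void* SGML_Object;

typedef struct {
    int  (*start_tag)(SGML_Object self, const char* gi,
                      const char** attributes, int nattrs);
    void (*end_tag)(SGML_Object self, const char* gi);
    void (*entity)(SGML_Object self, const char* name);
    void (*data)(SGML_Object self, const char* data, int char_qty);
} SGML_DocClass;

/* Hypothetical state: a stack of open element names plus the text. */
#define MAX_DEPTH 32
typedef struct {
    char open[MAX_DEPTH][16];
    int  depth;
    char text[256];
} StackDoc;

/* Append up to n bytes of s to the text, truncating when full. */
static void append(StackDoc* doc, const char* s, size_t n)
{
    size_t len = strlen(doc->text);
    size_t room = sizeof doc->text - 1 - len;
    if (n > room) n = room;
    memcpy(doc->text + len, s, n);
    doc->text[len + n] = '\0';
}

static int stack_start_tag(SGML_Object self, const char* gi,
                           const char** attributes, int nattrs)
{
    StackDoc* doc = (StackDoc*)self;
    (void)attributes; (void)nattrs;           /* unused in this sketch */
    assert(doc->depth < MAX_DEPTH);
    snprintf(doc->open[doc->depth++], sizeof doc->open[0], "%s", gi);
    return 0;                                 /* content type: unused here */
}

static void stack_end_tag(SGML_Object self, const char* gi)
{
    StackDoc* doc = (StackDoc*)self;
    int i;
    /* Close the innermost matching element and everything opened inside it. */
    for (i = doc->depth - 1; i >= 0; i--) {
        if (strcmp(doc->open[i], gi) == 0) {
            doc->depth = i;
            return;
        }
    }
    /* No matching open element: the stray end tag is ignored. */
}

static void stack_entity(SGML_Object self, const char* name)
{
    const char* s = strcmp(name, "lt") == 0  ? "<"
                  : strcmp(name, "gt") == 0  ? ">"
                  : strcmp(name, "amp") == 0 ? "&" : "";
    append((StackDoc*)self, s, strlen(s));
}

static void stack_data(SGML_Object self, const char* data, int char_qty)
{
    append((StackDoc*)self, data, (size_t)char_qty);
}

SGML_DocClass stackdoc_class = { stack_start_tag, stack_end_tag,
                                 stack_entity, stack_data };
```

- In this design, the parser never inspects the stack. It only calls through the table, so a different implementation (MidasWWW's, say) could keep its state in some completely different way.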
-
- On top of these we build some format parsing routines:
-
- void SGML_parse(void* dest, void* closure, void* stream, int (*getc)(void*));
- /* pseudocode:
- int read, content = MIXED;
- char buffer[1000];
- SGML_DocClass *docclass = (SGML_DocClass*)closure;
-
- while( (read = SGML_read(buffer, content, stream, getc)) != EOF){
- switch(read){
- case SGML_start_tag:
- ... parse name, attributes, nattrs ...
- content = (docclass->start_tag)(dest, name, attrs, nattrs);
- if(content == EMPTY){
- (docclass->end_tag)(dest, name);
- content = MIXED; (@@ could be ELEMENT)
- }
- break;
-
- case SGML_end_tag:
- ... parse name ...
- (docclass->end_tag)(dest, name);
- content = MIXED; (@@ could be ELEMENT)
- break;
-
- case SGML_entity:
- ... parse name ...
- (docclass->entity)(dest, name);
- break;
-
- default:
- (docclass->data)(dest, buffer, strlen(buffer));
- }
- }
- */
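- The pseudocode above leaves SGML_read and the tokenizing step abstract. What follows is a hypothetical, much-simplified stand-in; mini_SGML_parse is not a libWWW routine. It recognizes start tags, end tags, entity references, and character data. It skips attributes instead of parsing them and ignores content models entirely. It is here only to show how a tokenizer drives the four methods through a (stream, getc) pair. The string stream and the tracing class at the end are test scaffolding:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef void* SGML_Object;
typedef struct {
    int  (*start_tag)(SGML_Object self, const char* gi,
                      const char** attributes, int nattrs);
    void (*end_tag)(SGML_Object self, const char* gi);
    void (*entity)(SGML_Object self, const char* name);
    void (*data)(SGML_Object self, const char* data, int char_qty);
} SGML_DocClass;

#define NAMELEN 32

/* Read a name starting at *c; leave the first non-name char in *c.
   Tag names are folded to upper case, entity names are not. */
static void read_name(void* stream, int (*getc_fn)(void*), int* c,
                      char name[NAMELEN], int fold)
{
    size_t n = 0;
    while (*c != EOF && (isalnum((unsigned char)*c) || *c == '.' || *c == '-')) {
        if (n < NAMELEN - 1)
            name[n++] = (char)(fold ? toupper((unsigned char)*c) : *c);
        *c = getc_fn(stream);
    }
    name[n] = '\0';
}

/* Hypothetical tokenizer: no attribute parsing, no error recovery. */
void mini_SGML_parse(void* dest, const SGML_DocClass* docclass,
                     void* stream, int (*getc_fn)(void*))
{
    char buf[256], name[NAMELEN];
    int n = 0, c;

    assert(docclass != NULL && getc_fn != NULL);
    c = getc_fn(stream);
    while (c != EOF) {
        if (c != '<' && c != '&') {                    /* character data */
            buf[n++] = (char)c;
            if (n == (int)sizeof buf) { docclass->data(dest, buf, n); n = 0; }
            c = getc_fn(stream);
            continue;
        }
        if (n > 0) { docclass->data(dest, buf, n); n = 0; }  /* flush */
        if (c == '&') {                                /* entity reference */
            c = getc_fn(stream);
            read_name(stream, getc_fn, &c, name, 0);
            if (c == ';') c = getc_fn(stream);
            docclass->entity(dest, name);
        } else {                                       /* start or end tag */
            int is_end;
            c = getc_fn(stream);
            is_end = (c == '/');
            if (is_end) c = getc_fn(stream);
            read_name(stream, getc_fn, &c, name, 1);
            while (c != EOF && c != '>') c = getc_fn(stream);  /* skip attrs */
            if (c == '>') c = getc_fn(stream);
            if (is_end) docclass->end_tag(dest, name);
            else        docclass->start_tag(dest, name, NULL, 0);
        }
    }
    if (n > 0) docclass->data(dest, buf, n);
}

/* Test scaffolding: a string-backed stream and an event recorder. */
typedef struct { const char* next; } StrStream;
static int str_getc(void* s)
{
    StrStream* ss = (StrStream*)s;
    return *ss->next ? (unsigned char)*ss->next++ : EOF;
}

static char trace[256];
static void put3(const char* a, const char* b, const char* c)
{ strcat(trace, a); strcat(trace, b); strcat(trace, c); }
static int  t_start(SGML_Object d, const char* gi, const char** at, int na)
{ (void)d; (void)at; (void)na; put3("[", gi, "]"); return 0; }
static void t_end(SGML_Object d, const char* gi) { (void)d; put3("[/", gi, "]"); }
static void t_entity(SGML_Object d, const char* nm) { (void)d; put3("(", nm, ")"); }
static void t_data(SGML_Object d, const char* s, int n)
{ (void)d; strcat(trace, "{"); strncat(trace, s, (size_t)n); strcat(trace, "}"); }
static const SGML_DocClass trace_class = { t_start, t_end, t_entity, t_data };
```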
-
- void PlainText_parse(HText* dest, void* closure, void* stream, int (*getc)(void*));
- /* pseudocode:
- SGML_DocClass *docclass = (SGML_DocClass*)closure;
- (docclass->start_tag)(dest, "HTML", 0, 0);
- (docclass->start_tag)(dest, "BODY", 0, 0);
- (docclass->start_tag)(dest, "PRE", 0, 0);
- keep a local buffer of about 1000 chars.
- Call (getc)(stream) until EOF.
- Call (docclass->data)(dest, buffer, n) whenever the buffer is full,
- and once more at EOF for any remainder.
- (docclass->end_tag)(dest, "PRE");
- (docclass->end_tag)(dest, "BODY");
- (docclass->end_tag)(dest, "HTML");
- */
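- Assuming the same four-method class, the plain-text routine might look like the sketch below, with dest taken as a plain void*. The counting class and the stream of 'x' characters are hypothetical test scaffolding. The interesting part is the buffer handling: one data call per full buffer, plus one for whatever remains at EOF:

```c
#include <assert.h>
#include <stdio.h>

typedef void* SGML_Object;
typedef struct {
    int  (*start_tag)(SGML_Object self, const char* gi,
                      const char** attributes, int nattrs);
    void (*end_tag)(SGML_Object self, const char* gi);
    void (*entity)(SGML_Object self, const char* name);
    void (*data)(SGML_Object self, const char* data, int char_qty);
} SGML_DocClass;

#define PLAIN_BUFSIZE 1000   /* "about 1000 chars" */

void PlainText_parse(void* dest, const SGML_DocClass* docclass,
                     void* stream, int (*getc_fn)(void*))
{
    char buffer[PLAIN_BUFSIZE];
    int n = 0, c;

    assert(docclass != NULL && getc_fn != NULL);
    docclass->start_tag(dest, "HTML", NULL, 0);
    docclass->start_tag(dest, "BODY", NULL, 0);
    docclass->start_tag(dest, "PRE", NULL, 0);
    while ((c = getc_fn(stream)) != EOF) {
        buffer[n++] = (char)c;
        if (n == PLAIN_BUFSIZE) {             /* flush a full buffer */
            docclass->data(dest, buffer, n);
            n = 0;
        }
    }
    if (n > 0)                                /* flush the remainder */
        docclass->data(dest, buffer, n);
    docclass->end_tag(dest, "PRE");
    docclass->end_tag(dest, "BODY");
    docclass->end_tag(dest, "HTML");
}

/* Test scaffolding: a stream of N 'x' characters, and a class that
   counts nesting depth, data calls, and characters delivered. */
typedef struct { int remaining; } XStream;
static int x_getc(void* s) { XStream* xs = s; return xs->remaining-- > 0 ? 'x' : EOF; }

typedef struct { int depth, max_depth, data_calls, chars; } Counter;
static int c_start(SGML_Object d, const char* gi, const char** a, int n)
{ Counter* k = d; (void)gi; (void)a; (void)n;
  if (++k->depth > k->max_depth) k->max_depth = k->depth; return 0; }
static void c_end(SGML_Object d, const char* gi) { (void)gi; ((Counter*)d)->depth--; }
static void c_entity(SGML_Object d, const char* name) { (void)d; (void)name; }
static void c_data(SGML_Object d, const char* s, int n)
{ Counter* k = d; (void)s; k->data_calls++; k->chars += n; }
static const SGML_DocClass counter_class = { c_start, c_end, c_entity, c_data };
```

- Given 2500 characters of input and a 1000-character buffer, this makes three data calls (1000, 1000, and 500 characters), all nested inside HTML, BODY, and PRE.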
-
- void GopherListing_parse(HText* dest, void* closure, void* stream, int (*getc)(void*));
- /* pseudocode:
- SGML_DocClass *docclass = (SGML_DocClass*)closure;
- (docclass->start_tag)(dest, "HTML", 0, 0);
- (docclass->start_tag)(dest, "BODY", 0, 0);
- (docclass->start_tag)(dest, "MENU", 0, 0);
- while(Gopher_parse_line(stream, getc, &type, name, host, &port, path)){
- char addr[BIG];
- CONST char* attrs[2];
- sprintf(addr, "gopher://%s:%d/%c%s", host, port, type, path);
- attrs[0] = "HREF";
- attrs[1] = addr;
- (docclass->start_tag)(dest, "A", attrs, 1);
- (docclass->data)(dest, name, strlen(name));
- (docclass->end_tag)(dest, "A");
- }
- (docclass->end_tag)(dest, "MENU");
- (docclass->end_tag)(dest, "BODY");
- (docclass->end_tag)(dest, "HTML");
- */
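- Gopher_parse_line is left unspecified above. Here is a sketch of the line-level parsing. It assumes the usual Gopher menu line layout: a type character followed by the display string, then a TAB-separated selector, host, and port, ended by CR LF. A lone "." ends the listing. The names GopherItem, parse_gopher_line, and gopher_url are hypothetical, and this version takes one already-read line rather than a stream:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char type;
    char name[128];
    char path[256];
    char host[128];
    int  port;
} GopherItem;

/* Copy the field at *p (ending at TAB, CR, LF or NUL) into out,
   truncating to size, then advance *p past the TAB if there is one. */
static void next_field(const char** p, char* out, size_t size)
{
    size_t n = strcspn(*p, "\t\r\n");
    size_t k = n < size - 1 ? n : size - 1;
    memcpy(out, *p, k);
    out[k] = '\0';
    *p += n;
    if (**p == '\t') (*p)++;
}

/* Returns 1 for a usable menu item, 0 for the terminating "." line
   or a line with no host or port. */
int parse_gopher_line(const char* line, GopherItem* item)
{
    char port[16];
    const char* p = line;

    assert(line != NULL && item != NULL);
    if (line[0] == '\0' ||
        (line[0] == '.' && (line[1] == '\r' || line[1] == '\n' || line[1] == '\0')))
        return 0;
    item->type = *p++;
    next_field(&p, item->name, sizeof item->name);
    next_field(&p, item->path, sizeof item->path);
    next_field(&p, item->host, sizeof item->host);
    next_field(&p, port, sizeof port);
    item->port = atoi(port);
    return item->host[0] != '\0' && item->port > 0;
}

/* Build the address in the same format as the sprintf above.
   Returns 0 if buf is too small. */
int gopher_url(const GopherItem* item, char* buf, size_t size)
{
    int n = snprintf(buf, size, "gopher://%s:%d/%c%s",
                     item->host, item->port, item->type, item->path);
    return n >= 0 && (size_t)n < size;
}
```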
-
-
- We register each of these with the following routine:
-
- int
- ContentType_register(CONST char* type, CONST char* subtype,
- HTParseProc parse, void* closure);
-
- For example:
-
- int main(void)
- {
- ContentType_register("TEXT", "X-HTML", HTML_parse, &griddoc);
- ContentType_register("TEXT", "PLAIN", PlainText_parse, &griddoc);
- ContentType_register("APPLICATION", "X-GOPHER",
- GopherListing_parse, &griddoc);
- return 0;
- }
-
-
- The following routine can be used for any MIME entity. It dispatches to
- the appropriate parsing routine based on the Content-Type header:
-
- int
- ContentType_parse(CONST char* ct, HText* dest, void* stream, int (*getc)(void*));
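- Here is one possible shape for the registry behind ContentType_register and ContentType_parse. It rests on several assumptions. HTParseProc is taken to have SGML_parse's argument order (dest, closure, stream, getc), and dest is a plain void*. The table is a small fixed array. Type and subtype are matched case-insensitively, and any parameters after ";" are ignored. note_closure is a hypothetical parser used only to check the dispatch:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Assumed shape of HTParseProc, following SGML_parse's arguments. */
typedef void (*HTParseProc)(void* dest, void* closure,
                            void* stream, int (*getc_fn)(void*));

#define MAX_TYPES 16
static struct {
    const char* type;
    const char* subtype;
    HTParseProc parse;
    void*       closure;
} registry[MAX_TYPES];
static int ntypes = 0;

/* 1 if want equals the n characters at s, ignoring case. */
static int same_token(const char* want, const char* s, size_t n)
{
    size_t i;
    if (strlen(want) != n) return 0;
    for (i = 0; i < n; i++)
        if (toupper((unsigned char)want[i]) != toupper((unsigned char)s[i]))
            return 0;
    return 1;
}

int ContentType_register(const char* type, const char* subtype,
                         HTParseProc parse, void* closure)
{
    assert(type != NULL && subtype != NULL && parse != NULL);
    if (ntypes == MAX_TYPES) return 0;        /* table full */
    registry[ntypes].type = type;
    registry[ntypes].subtype = subtype;
    registry[ntypes].parse = parse;
    registry[ntypes].closure = closure;
    ntypes++;
    return 1;
}

/* Splits "type/subtype; params" and runs the first matching parser.
   Returns 1 if a parser ran, 0 if the type is malformed or unknown. */
int ContentType_parse(const char* ct, void* dest, void* stream,
                      int (*getc_fn)(void*))
{
    const char* slash = strchr(ct, '/');
    size_t tlen, slen;
    int i;

    if (slash == NULL) return 0;
    tlen = (size_t)(slash - ct);
    slen = strcspn(slash + 1, "; \t");
    for (i = 0; i < ntypes; i++) {
        if (same_token(registry[i].type, ct, tlen) &&
            same_token(registry[i].subtype, slash + 1, slen)) {
            registry[i].parse(dest, registry[i].closure, stream, getc_fn);
            return 1;
        }
    }
    return 0;
}

/* Demo parser: stores its closure pointer into *dest. */
static void note_closure(void* dest, void* closure,
                         void* stream, int (*getc_fn)(void*))
{
    (void)stream; (void)getc_fn;
    *(void**)dest = closure;
}
```

- Since the closure is stored next to each parser, a single parse routine can be registered several times with different document classes. That is how the griddoc registrations above pass the linemode's class through.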
-
-
- Then we build some load routines, one per access scheme:
- (note that this design separates format from the access scheme, which
- allows us to, for example, load a gopher menu
- from a local file, or load HTML text from a Gopher server)
-
- /* I don't have error handling worked out yet. We need a coherent
- design for this. It's a mess in the current libWWW. */
-
- /* I think the WWW file: scheme should be split into ftp: and local-file:.
- That's cleaner to implement, and there are precedents in the MidasWWW local:
- scheme and in the MIME ftp and local-file access-types. */
-
- /* adapts stdio to the (void*) getc interface the parsers expect */
- static int file_getc(void* stream)
- {
- return fgetc((FILE*)stream);
- }
-
- int
- LocalFile_load(HText* dest, CONST char* path, CONST char* search)
- {
- FILE* stream;
-
- if((stream = fopen(path, "r")) != NULL){
- const char* content_type = WWW_zen_content_type_from_extension(path);
- ContentType_parse(content_type, dest, (void*)stream, file_getc);
- fclose(stream);
- return 1;
- }else{
- /* log an error */
- return 0;
- }
- }
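- Separating format from access scheme comes down to this: every parser sees only a (stream, getc) pair. The sketch below illustrates the idea. count_lines stands in for a real format parser, and MemStream is a hypothetical in-memory source, such as data already read from the network. The same routine runs unchanged over a stdio file and over a string:

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in "format" routine: counts newlines. It never learns
   where its characters come from. */
int count_lines(void* stream, int (*getc_fn)(void*))
{
    int c, lines = 0;
    assert(getc_fn != NULL);
    while ((c = getc_fn(stream)) != EOF)
        if (c == '\n')
            lines++;
    return lines;
}

/* Access mechanism 1: a stdio file. */
int file_getc(void* stream)
{
    return fgetc((FILE*)stream);
}

/* Access mechanism 2 (hypothetical): a NUL-terminated string in memory. */
typedef struct { const char* next; } MemStream;

int mem_getc(void* stream)
{
    MemStream* m = (MemStream*)stream;
    return *m->next ? (unsigned char)*m->next++ : EOF;
}
```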
-
- int
- FTP_load(HText* dest, CONST char* path, CONST char* search);
-
- int
- HTTP_load(HText* dest, CONST char* path, CONST char* search);
-
- int
- Gopher_load(HText* dest, CONST char* path, CONST char* search)
- {
- const char* content_type = Gopher_zen_content_type_from_gtype_char(*path);
- char* host = HTParse(path, PARSE_HOST);
- char* portnum = HTParse(path, PARSE_PORT);
- int port = atoi(portnum);
- static char* tab = "\011"; /* TAB separates selector and search terms */
- static char* crlf = "\015\012";
-
- void* stream = TCPOpen(host, port);
-
- if(stream){
- TCPwrite(stream, path, strlen(path));
- if(search){
- TCPwrite(stream, tab, 1);
- TCPwrite(stream, search, strlen(search));
- }
- TCPwrite(stream, crlf, 2);
- ContentType_parse(content_type, dest, stream, TCPgetc);
- TCPclose(stream);
- return 1;
- }else{
- /* log an error */
- return 0;
- }
- }
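- The bytes Gopher_load sends can be checked in isolation. gopher_request, a hypothetical helper, writes the selector, an optional TAB plus search terms, and the terminating CR LF into a buffer rather than onto a TCP stream:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns the request length, or -1 if buf is too small. */
int gopher_request(char* buf, size_t size, const char* selector,
                   const char* search)
{
    int n;
    assert(buf != NULL && selector != NULL);
    if (search != NULL)
        n = snprintf(buf, size, "%s\t%s\r\n", selector, search);
    else
        n = snprintf(buf, size, "%s\r\n", selector);
    return (n >= 0 && (size_t)n < size) ? n : -1;
}
```

- An empty selector yields just CR LF, which asks the server for its top-level menu.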
-
-
- Then we register these just like formats:
-
- int
- HTAccess_register(CONST char* name, HTLoadProc load, void* closure);
-
-
- And the HTLoadDocument routine in HTAccess.c becomes this:
-
- int
- HTAccess_load(HTParentAnchor* p, CONST char* address)
- {
- char* scheme = HTParse(address, PARSE_SCHEME);
- /* path is everything after the colon, except the anchor */
- char* path = HTParse(address, PARSE_HOST|PARSE_PORT|PARSE_PATH);
- char* anchor = HTParse(address, PARSE_ANCHOR);
- char* search = HTParse(address, PARSE_SEARCH_TERMS);
- HText* dest = HText_new(p); /* check for doc already loaded in p @@ */
- void* closure;
- HTLoadProc load;
- int status = 0;
-
- if((load = /* load routine registered for scheme; find closure too */) != NULL){
- status = (load)(dest, path, search, closure);
- }
- HTSelect(dest, anchor);
- return status;
- }
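- Below is a rough sketch of the scheme registry and dispatch, under several assumptions. HTLoadProc is taken to have the (dest, path, search, closure) arguments used in the call above. split_address is a simplified, hypothetical stand-in for HTParse: it splits at the first ":", "?", and "#", and it does not handle relative addresses. record_load is a demo load routine. The anchor is split off but not used, because HTSelect is outside this sketch:

```c
#include <assert.h>
#include <string.h>

/* Assumed HTLoadProc shape. */
typedef int (*HTLoadProc)(void* dest, const char* path,
                          const char* search, void* closure);

#define MAX_SCHEMES 8
static struct { const char* name; HTLoadProc load; void* closure; } schemes[MAX_SCHEMES];
static int nschemes = 0;

int HTAccess_register(const char* name, HTLoadProc load, void* closure)
{
    assert(name != NULL && load != NULL);
    if (nschemes == MAX_SCHEMES) return 0;    /* table full */
    schemes[nschemes].name = name;
    schemes[nschemes].load = load;
    schemes[nschemes].closure = closure;
    nschemes++;
    return 1;
}

/* Split "scheme:path?search#anchor" in place; 0 if there is no scheme. */
static int split_address(char* addr, char** scheme, char** path,
                         char** search, char** anchor)
{
    char* colon = strchr(addr, ':');
    char* hash;
    char* query;
    if (colon == NULL) return 0;
    *colon = '\0';
    *scheme = addr;
    *path = colon + 1;
    hash = strchr(*path, '#');
    *anchor = hash ? (*hash = '\0', hash + 1) : NULL;
    query = strchr(*path, '?');
    *search = query ? (*query = '\0', query + 1) : NULL;
    return 1;
}

/* Returns the load routine's result, or 0 if no routine is registered. */
int access_load(void* dest, const char* address)
{
    char buf[1024];
    char *scheme, *path, *search, *anchor;
    int i;

    if (strlen(address) >= sizeof buf) return 0;
    strcpy(buf, address);
    if (!split_address(buf, &scheme, &path, &search, &anchor)) return 0;
    (void)anchor;                             /* would go to HTSelect */
    for (i = 0; i < nschemes; i++)
        if (strcmp(schemes[i].name, scheme) == 0)
            return schemes[i].load(dest, path, search, schemes[i].closure);
    return 0;
}

/* Demo load routine: writes "path|search" into dest. */
static int record_load(void* dest, const char* path, const char* search,
                       void* closure)
{
    char* out = dest;
    (void)closure;
    strcpy(out, path);
    if (search) { strcat(out, "|"); strcat(out, search); }
    return 1;
}
```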
-
-
- What do you think?
-
- Dan
-
-
-